Reuse Large Buffers in MigrateSession #623

Open: vazois wants to merge 38 commits into main from vazois/migration-reuse-buffers

@vazois vazois commented Aug 29, 2024

This PR aims to improve memory utilization and reduce fragmentation by reusing large buffers across migrate sessions instead of allocating new ones for each session.
The PR includes the following:

  • Augment the network stack to support separately sized send and receive buffers by providing a separate buffer pool object for each.
  • Declare a shared NetworkBuffers object in MigrationManager and ReplicationManager so that allocated buffers are reused across different scenarios.
  • Use the shared NetworkBuffers object to allocate buffer space for tracking keys that are actively being migrated.
  • Add the PURGEBP command. Issuing PURGEBP [MM|RM] attempts to release any unused buffers held in the LFBP of the migration manager (MM) or replication manager (RM).
  • Add INFO BPSTATS to report information about the shared buffer pools of the migration and replication managers.
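
As an illustration of the first two bullets, here is a minimal sketch of one shared object that owns separate send and receive pools with independent buffer sizes. It is not the PR's NetworkBuffers class: the name SharedNetworkBuffersSketch and all of its members are hypothetical, and .NET's ArrayPool<byte> is swapped in for the project's own pool type.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for a shared buffers object; not the PR's NetworkBuffers API.
public sealed class SharedNetworkBuffersSketch
{
    // One pool per direction, so send and receive sizes can differ.
    private readonly ArrayPool<byte> sendPool;
    private readonly ArrayPool<byte> receivePool;

    public int SendBufferSize { get; }
    public int ReceiveBufferSize { get; }

    // maxEntriesPerLevel mirrors the default limit of 16 mentioned in the PR notes (assumption).
    public SharedNetworkBuffersSketch(int sendBufferSize, int receiveBufferSize, int maxEntriesPerLevel = 16)
    {
        SendBufferSize = sendBufferSize;
        ReceiveBufferSize = receiveBufferSize;
        sendPool = ArrayPool<byte>.Create(sendBufferSize, maxEntriesPerLevel);
        receivePool = ArrayPool<byte>.Create(receiveBufferSize, maxEntriesPerLevel);
    }

    // Rent may return an array larger than requested; callers should use the size properties.
    public byte[] RentSend() => sendPool.Rent(SendBufferSize);
    public byte[] RentReceive() => receivePool.Rent(ReceiveBufferSize);

    // Buffers are cleared on return so that stale payload bytes are not leaked to the next user.
    public void ReturnSend(byte[] buffer) => sendPool.Return(buffer, clearArray: true);
    public void ReturnReceive(byte[] buffer) => receivePool.Return(buffer, clearArray: true);
}
```

A manager would create one such object at startup and hand it to every session it spawns, so that a buffer released by one session can be picked up by the next.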

Notes:

  • There is an upper limit on the number of entries per level in the LimitedBufferPool. A buffer returned while its level is already full is dropped rather than pooled, which may fragment the large object heap (LOH). The return path is shown below:

    if (Interlocked.Increment(ref pool[level].size) <= maxEntriesPerLevel)
    {
        Array.Clear(buffer.entry, 0, buffer.entry.Length);
        pool[level].items.Enqueue(buffer);
    }
    else
        Interlocked.Decrement(ref pool[level].size);

    The default limit is 16 entries, which should be enough for common scenarios (i.e. up to 16 parallel migrate sessions and up to 16 replication sessions).
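
To make the effect of the limit concrete, here is a minimal single-level sketch of a bounded pool. It assumes the counter tracks only the entries currently sitting in the queue, which the PR text does not spell out. BoundedLevelSketch, PooledCount, and Purge are hypothetical names chosen for illustration; they are not the project's types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical single-level bounded pool; illustrative only.
public sealed class BoundedLevelSketch
{
    private readonly ConcurrentQueue<byte[]> items = new ConcurrentQueue<byte[]>();
    private readonly int bufferSize;
    private readonly int maxEntries;
    private int size; // assumed meaning: number of buffers currently pooled

    public BoundedLevelSketch(int bufferSize, int maxEntries = 16)
    {
        this.bufferSize = bufferSize;
        this.maxEntries = maxEntries;
    }

    public int PooledCount => Volatile.Read(ref size);

    // Reuse a pooled buffer when one is available, otherwise allocate a new one.
    public byte[] Get()
    {
        if (items.TryDequeue(out var buffer))
        {
            Interlocked.Decrement(ref size);
            return buffer;
        }
        return new byte[bufferSize];
    }

    // Returns true if the buffer was pooled, false if it was dropped because the level is full.
    public bool Return(byte[] buffer)
    {
        if (Interlocked.Increment(ref size) <= maxEntries)
        {
            Array.Clear(buffer, 0, buffer.Length);
            items.Enqueue(buffer);
            return true;
        }
        Interlocked.Decrement(ref size);
        return false; // the dropped buffer is left to the garbage collector
    }

    // Release every pooled buffer, similar in spirit to what PURGEBP is described as doing.
    public int Purge()
    {
        var released = 0;
        while (items.TryDequeue(out _))
        {
            Interlocked.Decrement(ref size);
            released++;
        }
        return released;
    }
}
```

With a limit of 2, returning three buffers pools the first two and drops the third. Dropped large buffers must be collected and reallocated later, which is where the fragmentation risk comes from.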

  • Separate buffer pool from send and receive spec.

  • Remove unused parts of NetworkSenderBase.

  • Resize allocation for send/receive of replication code.

@vazois vazois force-pushed the vazois/migration-reuse-buffers branch 4 times, most recently from 29aa345 to 146da1a, September 4, 2024 20:26
@vazois vazois force-pushed the vazois/migration-reuse-buffers branch 3 times, most recently from e13374e to 9d65b87, September 10, 2024 02:23
@vazois vazois marked this pull request as ready for review September 10, 2024 16:35
@vazois vazois marked this pull request as draft September 10, 2024 17:24
@vazois vazois force-pushed the vazois/migration-reuse-buffers branch 3 times, most recently from 18e9c50 to d34f522, September 12, 2024 00:52
@vazois vazois marked this pull request as ready for review September 12, 2024 00:52
@vazois vazois force-pushed the vazois/migration-reuse-buffers branch 2 times, most recently from 2ab3868 to c362324, September 17, 2024 23:34
@vazois vazois force-pushed the vazois/migration-reuse-buffers branch from c362324 to dfc86d7, September 18, 2024 17:10

@badrishc badrishc left a comment

See comments.

libs/cluster/Server/Failover/ReplicaFailoverSession.cs (outdated, resolved)
benchmark/Resp.benchmark/RespOnlineBench.cs (outdated, resolved)
libs/cluster/Server/Migration/MigrationManager.cs (outdated, resolved)
libs/cluster/Server/Replication/ReplicationManager.cs (outdated, resolved)
@@ -61,8 +69,8 @@ public void Return(PoolEntry buffer)
Interlocked.Decrement(ref pool[level].size);
Contributor

This needs commenting. My first thought on seeing .size was to wonder why we can't just use the queue.Count method. Now it looks like .size is total number of allocations? The PoolLevel field name comments are uninformative.

Contributor Author

I am not sure what the comment is asking for here. The LFBP implementation has not changed; I just added the Purge functionality and the GetStats option. There might be a need to redesign the buffer pool, but that should be part of another PR.

     {
 #if HANGDETECT
         if (++count % 10000 == 0)
             logger?.LogTrace("Dispose iteration {count}, {activeHandlerCount}", count, activeHandlerCount);
 #endif
         Thread.Yield();
     }
-    for (int i = 0; i < numLevels; i++)
+    for (var i = 0; i < numLevels; i++)
     {
         if (pool[i] == null) continue;
         while (pool[i].size > 0)
Contributor

If I'm understanding correctly that .size is total # of allocations which may be > maxEntriesPerLevel, then this can spin forever if there are unReturned items. This might warrant an Assert or logging with an exit. (Holding an unReturned item and calling Dispose() (and Purge()?) is bad anyway, so let's make it easier to catch).
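
To illustrate the kind of guard suggested here, a minimal sketch of a drain loop that gives up after a timeout and logs instead of spinning forever. DrainSketch, WaitForDrain, and their parameters are hypothetical names for this example and are not part of the PR or of the pool's current code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper; illustrative only.
public static class DrainSketch
{
    // Polls the outstanding counter until it reaches zero or the timeout elapses.
    // Returns true if drained, false if entries were still outstanding at the deadline.
    public static bool WaitForDrain(Func<int> outstanding, TimeSpan timeout, Action<string> log = null)
    {
        var stopwatch = Stopwatch.StartNew();
        while (outstanding() > 0)
        {
            if (stopwatch.Elapsed > timeout)
            {
                // Surface the leak instead of hanging Dispose indefinitely.
                log?.Invoke($"Drain timed out with {outstanding()} entries still outstanding");
                return false;
            }
            Thread.Yield();
        }
        return true;
    }
}
```

A caller could pass a lambda that reads the level's counter and decide whether to assert, log, or continue disposing when the method returns false.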

Contributor Author

This PR does not change the LFBP implementation. Changes to its core operations should be part of a separate PR.

libs/common/NetworkBuffers.cs (outdated, resolved)
/// MigrationManager Buffer Pool
/// </summary>
MM,
/// <summary>
Contributor

Why not Migration, Replication, ServerSocket?

Contributor

It should not use the word "socket", as that implies TCP. The same network stack can be used with RDMA, etc. Maybe ServerListener?

libs/server/Resp/PurgeBPCommand.cs (resolved)
3 participants